ABSTRACT

This paper presents a VLSI architecture for a three-operand binary adder. The proposed design is based on a carry-select adder (CSLA) and a Han-Carlson adder (HCA). The carry-select adder is known for its high speed. The architecture uses a novel carry-in selection scheme that reduces the number of logic gates required for carry generation. The Han-Carlson adder, a parallel-prefix two-operand adder, can also be used for three-operand addition, significantly reducing the critical path delay at the cost of additional hardware. To perform three-operand binary addition with significantly less area and lower power consumption, a novel high-speed, area-efficient adder architecture is proposed. This architecture uses pre-computed bitwise addition followed by carry-selection computation logic. The design also includes a parallel processing unit that allows multiple results to be computed efficiently from multiple operands simultaneously. The proposed architecture has been implemented in Verilog HDL and synthesized using a 32 nm CMOS technology. The simulation results show that the design achieves high performance with low power consumption. Additionally, the proposed design can easily be scaled to handle larger operands without sacrificing performance or incurring area overhead. Compared with current three-operand adder approaches, the proposed adder achieves the lowest area-delay product (ADP) and power-delay product (PDP).
Keywords: Carry-Select Adder, Han-Carlson, Verilog HDL, Three-Operand, Binary Adder. 


I. INTRODUCTION 

Adder circuits are required in virtually all digital applications, and in every VLSI circuit the crucial factors are area, power, and delay. Addition and multiplication are the most basic operations of high-performance CPUs, and adders are the fundamental building blocks of arithmetic circuits. A basic adder circuit must therefore use minimal power and occupy a small area, yet there is always a tradeoff between power and delay. The speed of an adder is determined by the time it takes for the carry to propagate, so carry generation and propagation set its speed limit. Adders are used not only for addition but also for subtraction, multiplication, and division, so efficient adders are needed for high-speed applications. An adder's efficiency is assessed by its area, power, and delay. The data path consumes nearly one-third of the total power, and adders are an important part of the data path, so designing an efficient adder can drastically lower power consumption. The circuits at the heart of battery-powered electronic devices must dissipate very little power to extend battery life.

Arithmetic operations are at the heart of the data path, and for high-speed applications they should be performed as quickly as possible. Once these basic operations are fast, they can in turn speed up more complex circuits, which is why designing an efficient adder is a top priority. With technology scaling, other characteristics such as wire length and fan-out must also be taken into account. Designing an adder with minimal area and delay directly improves circuit performance.

The complexity of today's designs implies a significant increase in the speed and power consumption of Very Large Scale Integrated (VLSI) chips. Considerable research has been done on improving the speed of VLSI chips using different design techniques. Because of the complexity of modern IC technology, manual power optimization is slow and error-prone, so CAD tools are mandatory for the design of VLSI circuits. The key to the success of VLSI design lies in CMOS technology, which is widely adopted for its low power consumption, noise immunity, and scaling properties; scaling reduces the feature size of the transistors. The circuit designer uses EDA tools to optimize the architecture. According to Moore's law, the number of transistors on a chip doubles approximately every eighteen months. In addition, technology development allows increasingly complex circuits to be built on a single chip and operated at high clock frequencies.

The carry-select adder (CSLA), considered one of the most efficient adders currently in use, is widely used in data-processing systems to perform arithmetic and logical operations. The carry-lookahead adder (CLA) has a shorter delay but occupies a larger area, whereas the ripple-carry adder (RCA) occupies a smaller area but has a longer delay. The CSLA consists of blocks of RCAs and multiplexers, so compared with other adder topologies it demands a large amount of hardware area; there is, however, potential for reducing this area by implementing an add-one logic scheme. The most important benefit of the CSLA is that it performs the addition in parallel under two assumptions: one set of RCAs assumes a carry input of 0 and another set assumes a carry input of 1. The actual input carry then drives the select line of every multiplexer. Because the results for both carry-in (Cin) values, 0 and 1, are precomputed, the final sum is available as soon as the actual carry arrives, which makes the CSLA fast. However, because each block uses both RCA sections, the architecture demands a large hardware area. To circumvent this issue, the constant multiplier uses a modified CSLA for the expansion operation.
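The carry-select principle described above can be sketched as a bit-level behavioural model. This is an illustration under stated assumptions, not the authors' Verilog design: the function names `ripple_add` and `carry_select_add`, the 16-bit width, and the uniform 4-bit block size are hypothetical choices, and for simplicity every block (including the first) computes both candidate results.

```python
def ripple_add(a_bits, b_bits, cin):
    """Ripple-carry addition of two little-endian bit lists; returns (sum_bits, carry_out)."""
    sum_bits, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sum_bits, carry


def carry_select_add(a, b, cin=0, n=16, block=4):
    """Hypothetical n-bit carry-select adder built from RCA pairs and 2:1 multiplexers.

    Returns (n-bit sum, carry_out).
    """
    sum_bits, carry = [], cin
    for lo in range(0, n, block):
        width = min(block, n - lo)
        a_blk = [(a >> (lo + i)) & 1 for i in range(width)]
        b_blk = [(b >> (lo + i)) & 1 for i in range(width)]
        # Both RCAs work in parallel, before the real block carry-in is known.
        s0, c0 = ripple_add(a_blk, b_blk, 0)   # assumes carry-in = 0
        s1, c1 = ripple_add(a_blk, b_blk, 1)   # assumes carry-in = 1
        # The actual carry drives the select line of the sum and carry multiplexers.
        sum_bits += s1 if carry else s0
        carry = c1 if carry else c0
    total = sum(bit << i for i, bit in enumerate(sum_bits))
    return total, carry
```

For example, `carry_select_add(0xFFFF, 0x0001)` returns `(0, 1)`: the 16-bit sum wraps to zero and the carry-out is 1. The price of the speed-up is visible in the sketch: every block instantiates two RCAs.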

The RCA is a logic circuit built from a cascade of full adders and is used to add N-bit numbers. For the addition of the partial-product terms, the proposed 2-bit BCSE constant multiplier uses a buffer-based full adder, which helps reduce the delay and the power-delay product of the constant multiplier.
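A minimal model of an RCA as a cascade of full adders follows. It is a behavioural Python sketch with hypothetical names (`full_adder`, `ripple_carry_adder`) and an assumed default width of 8 bits; it models logic values only, not the buffer-based transistor-level full adder mentioned above.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (b & cin) | (cin & a)  # majority of the three inputs
    return s, cout


def ripple_carry_adder(a, b, cin=0, n=8):
    """n-bit RCA: n cascaded full adders; returns (n-bit sum, carry_out).

    Each stage must wait for the carry from the previous stage, so the
    worst-case delay grows linearly with n; for example a = 2**n - 1, b = 0,
    cin = 1 makes the carry ripple through every stage.
    """
    total, carry = 0, cin
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry
```

The loop makes the RCA's weakness explicit: stage i cannot produce its outputs until stage i - 1 has produced its carry.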


II. LITERATURE SURVEY 

Mehedi Hasan et al. (2020) [1] used Cadence computer-aided design (CAD) tools to design and construct an ultrahigh-speed hybrid full adder cell based on gate diffusion input (GDI) and complementary CMOS (CCMOS) logic in a 45 nm CMOS process. Seyed Erfan Fatemieh et al. (2021) [2] proposed a low-power, area-efficient, and high-performance approximate full adder based on static CMOS, reporting an energy improvement of up to 72% in a 4-bit ripple-carry adder structure.

Spoorthi (2021) [3] examined full adder designs across technologies, together with their energy and latency. Pravitha et al. (2020) [4] employed Efficient Charge Recovery Logic (ECRL) to design an adiabatic adder. Keerthana et al. (2020) [5] created a hybrid one-bit full adder cell in 22 nm CMOS that operates at a 0.8 V supply voltage.

Basava Raj et al. (2021) [6] proposed a one-bit low-power hybrid adder circuit that combines transmission gate logic with additional logic based on the direct and reverse polarisation methods of complementary metal-oxide-semiconductor (CMOS) devices, implemented in 180 nm technology with a 1.8 V supply. Jyoti Kandpal et al. (2021) [7] presented a 20-transistor hybrid full adder using pass-transistor logic and transmission-gate logic for high-performance arithmetic applications and compared it with earlier full adder architectures. Majid Amini Valashani et al. (2018) [8] proposed a new low-power, energy-efficient 18-transistor (18T) full adder.

Mehedi Hasan et al. (2020) [9] created a full adder using XOR and XNOR modules and compared it with a conventional full adder architecture in terms of silicon area. The XOR gate uses three transistors, and the CMOS full adder uses two 3T XOR gates and one 2T multiplexer. The full adder created by Somashekhar Malipatil et al. (2020) [10] uses eight transistors.

Because it uses multiple pairs of RCAs to generate the carry and sum outputs, the conventional carry-select adder is not area-efficient. Using gate-level optimization, Singh & Kumar (2011) [11] modified the carry-select adder (CSLA) to consume less area and energy: binary-to-excess-1 converters (BECs) are used in place of the carry-input-1 RCAs of the conventional carry-select adder, saving area and power while keeping the delay low.
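The binary-to-excess-1 idea can be sketched as follows. This is a behavioural Python model assuming 4-bit blocks and hypothetical function names (`ripple_add`, `binary_to_excess_1`, `bec_carry_select_add`); it shows the general BEC-based carry-select structure rather than the exact gate-level circuit of [11]. Each block keeps only the RCA with carry input 0, and the BEC adds one to that block's (sum, carry) result to obtain the carry-input-1 candidate.

```python
def ripple_add(a_bits, b_bits, cin):
    """Ripple-carry addition of little-endian bit lists; returns (sum_bits, carry_out)."""
    out, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return out, carry


def binary_to_excess_1(bits):
    """BEC: returns bits + 1 (mod 2**len(bits)), little-endian.

    out[0] = NOT x[0]; out[i] = x[i] XOR (x[0] AND ... AND x[i-1]).
    """
    out, all_ones_below = [], 1
    for x in bits:
        out.append(x ^ all_ones_below)
        all_ones_below &= x
    return out


def bec_carry_select_add(a, b, cin=0, n=16, block=4):
    """Carry-select addition where a BEC replaces each block's carry-in-1 RCA."""
    sum_bits, carry = [], cin
    for lo in range(0, n, block):
        width = min(block, n - lo)
        a_blk = [(a >> (lo + i)) & 1 for i in range(width)]
        b_blk = [(b >> (lo + i)) & 1 for i in range(width)]
        s0, c0 = ripple_add(a_blk, b_blk, 0)       # the only RCA: carry-in assumed 0
        inc = binary_to_excess_1(s0 + [c0])        # same block result plus one
        s1, c1 = inc[:-1], inc[-1]
        sum_bits += s1 if carry else s0            # block carry-in drives the MUX
        carry = c1 if carry else c0
    total = sum(bit << i for i, bit in enumerate(sum_bits))
    return total, carry
```

The BEC input is one bit wider than the block because it also increments the block's carry-out; since a block's sum with carry-in 0 is at most 2**(width + 1) - 2, adding one never overflows that width.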

In 2012, Wey et al. [12] proposed an area-efficient CSLA based on the common Boolean logic (CBL) technique. Compared with the traditional CSLA, the common Boolean logic method replaces the RCA structure with a carry input of 0. The drawback of this approach is its delay: the common Boolean logic method has the same delay as the RCA structure. Vinodkumar et al. (2015) [13] developed a low-power, high-speed CSLA for VLSI applications in which the final carry output is selected before the final sum is computed, and adder operations deemed unnecessary are removed. Compared with the typical CSLA, its area and energy requirements were lower.
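The sharing idea behind common Boolean logic can be illustrated at the bit level. In the Python sketch below (hypothetical names `cbl_bit` and `cbl_add`, assumed 8-bit width), the XOR term is computed once per bit and reused for both possible carry inputs; it demonstrates only this logic identity, not the block structure of Wey et al. [12].

```python
def cbl_bit(a, b):
    """Shared per-bit terms for both possible carry-ins: {carry_in: (sum, carry_out)}.

    The XOR term is computed once; with carry-in 1 the sum is simply its
    inverse, and the carry-out switches from a AND b to a OR b.
    """
    half = a ^ b
    return {0: (half, a & b), 1: (1 - half, a | b)}


def cbl_add(a, b, cin=0, n=8):
    """Adds two n-bit numbers by selecting, bit by bit, between the shared terms."""
    total, carry = 0, cin
    for i in range(n):
        s, carry = cbl_bit((a >> i) & 1, (b >> i) & 1)[carry]
        total |= s << i
    return total, carry
```

Because the selection at each bit still depends on the carry from the bit below, the carry passes through every bit position, which is consistent with the RCA-like delay noted above.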

Archana et al. (2014) [14] designed an RCA with low power consumption and high speed, aiming to reduce both power consumption and propagation delay. The gate diffusion input (GDI) structure and the hybrid CMOS logic style were used so that the RCA could achieve both high speed and low power consumption; the hybrid logic method can be effective in many different contexts. A GDI multiplexer-based full adder was developed as an alternative to the traditional XOR/XNOR full adder, and incorporating this GDI MUX full adder into the RCA framework demonstrated high-speed operation with minimal power consumption.
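The logic function of a multiplexer-based full adder can be checked with a short Python sketch. It models only Boolean behaviour with hypothetical names (`mux2`, `mux_full_adder`); the GDI cell is a transistor-level implementation that this sketch does not capture, and using a XOR b as the shared select signal is one common MUX full-adder formulation, assumed here.

```python
def mux2(sel, in0, in1):
    """2:1 multiplexer: in0 when sel == 0, in1 when sel == 1."""
    return in1 if sel else in0


def mux_full_adder(a, b, cin):
    """Full adder built around 2:1 MUXes, with a XOR b as the shared select signal."""
    x = a ^ b
    s = mux2(x, cin, 1 - cin)   # a == b -> sum = cin;  a != b -> sum = NOT cin
    cout = mux2(x, a, cin)      # a == b -> carry = a;  a != b -> carry = cin
    return s, cout
```

When a equals b, the carry-out is simply a (both 0 or both 1), and when they differ the incoming carry passes straight through, so only the XOR term and two multiplexers are needed.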

Kandula et al. (2016) presented a CSLA with a zero finding circuit (ZFC). Unlike previous designs, it is implemented without a MUX in the final stage of the CSLA. Eliminating the separate selection of sum and carry-out in each block yields a significant reduction in area.


III. METHODOLOGY


Fig 1. Proposed System Architecture (Han-Carlson adder for the most significant part; outputs Sum MSP and Sum LSP)


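During the first stage (bit-addition logic), the bitwise addition of three n-bit binary input operands a, b, and c is carried out using an array of full adders. Each full adder computes the sum (S'_i) and carry (cy_i) signals. The second stage then computes generate (G_i) and propagate (P_i) signals from S'_i and the carry cy_{i-1} of the preceding bit position, and the final sum bits are formed from P_i and the group generate signals, as given by the following equations.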


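$$S'_i = a_i \oplus b_i \oplus c_i, \qquad cy_i = a_i \cdot b_i + b_i \cdot c_i + c_i \cdot a_i$$

$$G_{i:i} = G_i = S'_i \cdot cy_{i-1}, \qquad G_{0:0} = G_0 = S'_0 \cdot C_{in}$$

$$P_{i:i} = P_i = S'_i \oplus cy_{i-1}, \qquad P_{0:0} = P_0 = S'_0 \oplus C_{in}$$

$$S_i = P_i \oplus G_{i-1:0}, \qquad S_0 = P_0, \qquad C_{out} = G_{n:0}$$

where $G_{i-1:0}$ is the group generate signal spanning bits $i-1$ down to 0, computed by the carry (prefix) network.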
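The two stages defined by these equations can be checked with a bit-level Python model. This is a sketch under stated assumptions rather than the authors' Verilog: the function name `three_operand_add` and the 8-bit default width are hypothetical, and the group generate signals G_{i:0} are evaluated serially with the recurrence G_{i:0} = G_i + P_i·G_{i-1:0} instead of a Han-Carlson prefix tree, which yields the same values but not the same delay.

```python
def three_operand_add(a, b, c, n=8, cin=0):
    """Bit-level model of the two-stage three-operand adder; returns the (n + 2)-bit sum."""

    def bit(x, i):
        return (x >> i) & 1

    # Stage 1 (bit-addition logic): one full adder per bit position, no carry chain.
    s1 = [bit(a, i) ^ bit(b, i) ^ bit(c, i) for i in range(n)] + [0]   # S'_i, with S'_n = 0
    cy = [(bit(a, i) & bit(b, i)) | (bit(b, i) & bit(c, i)) | (bit(c, i) & bit(a, i))
          for i in range(n)]                                            # cy_i
    cy_prev = [cin] + cy                                                # cy_{i-1}, with C_in at bit 0

    # Stage 2: bitwise generate/propagate of S' and the left-shifted carry vector.
    G = [s1[i] & cy_prev[i] for i in range(n + 1)]
    P = [s1[i] ^ cy_prev[i] for i in range(n + 1)]

    # Group generate G_{i:0}, evaluated serially here (a prefix tree in hardware).
    group, g = [], 0
    for i in range(n + 1):
        g = G[i] | (P[i] & g)
        group.append(g)

    # Sum logic: S_0 = P_0, S_i = P_i XOR G_{i-1:0}, C_out = G_{n:0}.
    S = [P[0]] + [P[i] ^ group[i - 1] for i in range(1, n + 1)]
    result = sum(s << i for i, s in enumerate(S))
    return result | (group[n] << (n + 1))
```

For instance, with n = 8, `three_operand_add(255, 255, 255)` gives 765, which needs all n + 2 = 10 output bits. Stage 1 has a constant depth of one full adder regardless of n; only stage 2 contains a carry chain.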


A. Most Significant Part 

Han-Carlson Adder: 

The design of very large scale integrated (VLSI) circuits with low area and high speed has become an important concern for computer designers. Rapid advances in mobile communication and computing have increased interest in area-efficient VLSI design. Mobile phones are evolving toward thinner and more refined profiles, and because of their smaller batteries, portable devices have only a limited power budget. Designers therefore face greater challenges, such as achieving high throughput in a small silicon area, which makes highly efficient adder cells very relevant. Transistor dimensions have been scaled down into the deep-submicron region to reduce the area required by computing circuits, and many adder architectures have been developed and implemented to reduce power consumption. Reducing logic improves throughput, as does improving area efficiency. The CSLA has been adopted to achieve higher speeds: it reduces the carry propagation delay by generating multiple carries independently and then selecting the correct carry to produce the sum. Nonetheless, the CSLA is area-inefficient because it uses multiple pairs of RCAs to generate partial sums and carries for Cin = 0 and Cin = 1, and the final sums and carries are then chosen by multiplexers.

The Han-Carlson prefix tree is constructed in a manner analogous to the Kogge-Stone tree and, like it, has a maximum fan-out of 2. The Han-Carlson prefix tree can be generated simply by modifying the pseudocode used to generate the Kogge-Stone tree. The primary difference is that, at every logic level, Han-Carlson prefix cells operate only on alternate bits, and the carries of the remaining bits are computed in one additional final logic level. The fan-out, number of black cells, and number of logic levels of the Han-Carlson adder are thus balanced to a satisfactory degree: it uses fewer black cells than the Kogge-Stone adder and therefore consumes less energy, whereas the Kogge-Stone adder, which needs one logic level fewer, achieves faster execution. As a result, a high-speed Han-Carlson adder can be implemented efficiently.
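The Han-Carlson structure described above can be sketched in Python as follows. It is a behavioural model under stated assumptions, not the authors' HDL: the name `han_carlson_add` is hypothetical, the Kogge-Stone-style levels are assumed to operate on odd-numbered bits with a final level filling in the even-numbered bits, and the carry-in is folded into bit 0. Level 1 combines each odd bit with its even neighbour, the next log2(n) - 1 levels (for a power-of-two n) apply Kogge-Stone spans 2, 4, 8, ... to odd bits only, and one extra level completes the even bits.

```python
def han_carlson_add(a, b, n=16, cin=0):
    """Han-Carlson parallel-prefix addition of two n-bit integers; returns (sum, carry_out)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    # (G[i], P[i]): group generate/propagate from bit i down to some lower bit.
    G, P = g[:], p[:]
    G[0] = g[0] | (p[0] & cin)          # fold the carry-in into bit 0

    # Level 1: each odd bit combines with its even neighbour.
    for i in range(1, n, 2):
        G[i], P[i] = G[i] | (P[i] & G[i - 1]), P[i] & P[i - 1]

    # Kogge-Stone levels restricted to odd bits (spans 2, 4, 8, ...).
    span = 2
    while span < n:
        G_next, P_next = G[:], P[:]
        for i in range(1, n, 2):
            if i - span >= 0:
                G_next[i] = G[i] | (P[i] & G[i - span])
                P_next[i] = P[i] & P[i - span]
        G, P = G_next, P_next
        span *= 2

    # Final extra level: each even bit uses the finished prefix of the odd bit below it.
    for i in range(2, n, 2):
        G[i] = G[i] | (P[i] & G[i - 1])

    carries_in = [cin] + G[:-1]          # carry into bit i is G_{i-1:0}
    total = sum((p[i] ^ carries_in[i]) << i for i in range(n))
    return total, G[n - 1]
```

For n = 16 this gives 1 + 3 + 1 = 5 logic levels, one more than a 16-bit Kogge-Stone tree, while each odd-bit cell drives at most two later cells.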

B. Least Significant Part 

Low-Cost Carry Generation & Selection Carry Adder (LCCGSCA): 

The low-cost CGSCA (LCCGSCA) achieves a smaller area than the conventional adder while maintaining the same latency, and its power consumption is lower than that of existing adders. The simplification used to create the low-cost carry generation and selection (CGS) logic relies on the fact that, in practice, a 2-to-1 MUX is not implemented with discrete gates such as AND-OR or three NAND gates, because many synthesis libraries provide an optimised low-area MUX cell.


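$$c(i) = \sum_{j=0}^{i} G(j) \prod_{k=j+1}^{i} P(k) + C_{in} \prod_{k=0}^{i} P(k)$$

$$c(i) = G(i) + P(i) \cdot c(i-1), \qquad c(-1) = C_{in}$$

$$c^{0}(i) = \sum_{j=0}^{i} G(j) \prod_{k=j+1}^{i} P(k)$$

$$c^{0}(i) = G(i) + P(i) \cdot c^{0}(i-1), \qquad c^{0}(-1) = 0$$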
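The relation between the two carry forms above can be illustrated with a short Python sketch. It assumes the per-bit G(i) and P(i) signals are already available and uses hypothetical function names. `carry_generate_select` computes c^0(i) once, independently of C_in, and then uses C_in as a 2:1 MUX select, relying on the identity c(i) = c^0(i) + C_in · (product of P(0) through P(i)), which follows from comparing the first and third equations.

```python
def carries_with_cin(G, P, cin):
    """Recurrence c(i) = G(i) + P(i).c(i-1) with c(-1) = C_in; returns [c(0), ..., c(n-1)]."""
    out, c = [], cin
    for g, p in zip(G, P):
        c = g | (p & c)
        out.append(c)
    return out


def carry_generate_select(G, P, cin):
    """Carry generation with c0(-1) = 0, followed by carry selection controlled by C_in.

    c0(i) does not depend on C_in, so it can be computed before C_in arrives;
    C_in then selects between c0(i) and c0(i) + P(0).P(1)...P(i).
    """
    c0 = carries_with_cin(G, P, 0)   # carry generation: c0(i) with c0(-1) = 0
    out, prod_p = [], 1
    for i, p in enumerate(P):
        prod_p &= p                  # running product P(0).P(1)...P(i)
        out.append((c0[i] | prod_p) if cin else c0[i])   # 2:1 MUX on C_in
    return out
```

Both functions return the same carries; the second form moves C_in off the carry-generation path, so only the final multiplexer waits for it.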


IV. RESULTS AND DISCUSSION 
In this section, the proposed system is evaluated using theoretical estimation and synthesis results and compared against state-of-the-art designs. The functional correctness of every design was validated by simulation.


A. EXISTING SYSTEM: 




POWER 




Fig 4. Power Supply for Existing System 
SIMULATION 


Fig 5. Simulation Results for Existing System 
B. PROPOSED SYSTEM:
DELAY 
Fig 7. Area for Proposed System 
POWER:
Power summary (XPower estimate):
Total estimated power consumption: 55 mW
Quiescent Vccint 1.80 V: 10
Quiescent Vcco33 3.30 V: 2
Device: 2s50epq208-7
Fig 8. Power Supply for Proposed System
V. CONCLUSION 


This work presents a VLSI architecture for a three-operand binary adder. A novel high-speed, area-efficient adder design is proposed that performs three-operand binary addition with considerably less area and lower power consumption. The design also includes a parallel processing unit that allows multiple results to be computed efficiently from multiple operands simultaneously. The proposed architecture has been implemented in Verilog HDL and synthesized using a 32 nm CMOS technology. The simulation results show that the design achieves high performance with low power consumption. Additionally, the proposed design can easily be scaled to handle larger operands without sacrificing performance or incurring area overhead. Compared with existing three-operand adder designs, the proposed adder achieves the lowest ADP and PDP.